

ON SEGMENTATION OF TIME SERIES

by 

STANLEY L. SCLOVE
University  of  Illinois  at  Chicago  Circle 


Invited Paper, Special Session on Cluster Analysis, 789th Meeting,
American Mathematical Society, University of Massachusetts,
Amherst, MA, October 16-18, 1981


TECHNICAL  REPORT  NO.  81-1 
November  30,  1981 

PREPARED  FOR  THE 
OFFICE  OF  NAVAL  RESEARCH 
UNDER 

CONTRACT N00014-80-C-0408,

TASK  NR042-443 

with  the  University  of  Illinois  at  Chicago  Circle 
Principal  Investigator:  Stanley  L.  Sclove 




Reproduction  in  whole  or  in  part  is  permitted 
for  any  purpose  of  the  United  States  Government. 


Approved  for  public  release;  distribution  unlimited 


DEPARTMENT  OF  MATHEMATICS 
UNIVERSITY  OF  ILLINOIS  AT  CHICAGO  CIRCLE 
CHICAGO,  ILLINOIS  60680 



CONTENTS 

Abstract; Key Words & Phrases

1.  Introduction 

2.  The  Model 

3.  An  Algorithm 

3.1. Development of the algorithm

3.2.  The  first  iteration 

3.3.  Restrictions  on  the  transitions 

4. An Example

4.1.  Fitting  the  model 

4.2. Choice of number of classes

4.3.  Prediction 

Acknowledgements 
References 






ABSTRACT 

The problem of partitioning a time series into segments is
considered. The segments fall into classes, which may correspond to
phases of a cycle (recession, recovery, expansion in the business
cycle) or to portions of a signal obtained by scanning (background/
clutter, target, background/clutter again, another target, etc.; or
normal tissue, tumor, normal tissue). Parametric families of
distributions are considered, a set of parameter values being
associated with each class. With each observation is associated an
unobservable label, indicating from which class the observation
arose. The label process is modeled as a Markov chain.

Segmentation algorithms are obtained by applying a method of
iterated maximum likelihood to the resulting likelihood function.

In this paper special attention is given to the situation in which
the observations are conditionally independent, given the labels.

A numerical example is given. Choice of the number of classes, using
Akaike's automatic (model) identification criterion (AIC), is
illustrated. Prediction is considered.


Key Words & Phrases: forecasting; prediction; signal analysis;
isodata procedure; Markov chains; maximum likelihood; Akaike's
automatic (model) identification criterion (AIC).

1.  Introduction 

The problem of "segmentation" considered here is: Given a time series
$\{x_t,\ t=1,2,\ldots,n\}$, partition the set of values of $t$ into sub-series (segments,
regimes) which are relatively homogeneous. The segments are assumed to fall
into several classes. In processes which may be considered as cycles the
classes are phases of the cycle.

Examples. (i) Segment an economic time series into periods of recession,
recovery, and expansion. Here there are three classes of segment. (ii) Segment
an electrocardiogram into rhythmic and arrhythmic periods (two classes
of segment). (iii) Segment an electroencephalogram of a sleeping person
into periods of deep sleep and restless or fitful sleep (two classes of
segment). (iv) Segment a received signal into segments of background,
target, background again, another target, etc.

The observation $x$ may be a scalar, vector, or matrix: any element of
a linear space, for which the operations of addition and scalar multiplication
are defined. (One needs to perform such operations as $x_t - c\,x_{t-1}$,
where $c$ is a scalar.)

In  some  applications  the  definition  of  the  classes  involves  the  values 
of  the  observed  x;  in  others,  their  definition  may  be  logically  independent 
of  the  value-space  of  X.  In  the  former  case  the  classes  may  be  viewed  simply 
as  a  partition  of  the  value-space  of  X. 
2. The Model

One can imagine a series which is usually relatively smooth but occasionally
rather jumpy as being composed of sub-series which are first-order
autoregressive [AR(1)], the autocorrelation coefficient $\phi$ being positive
for the smooth segments and negative for the jumpy ones. In a simple case
one might try fitting a segmentation with two classes given by AR(1; $\phi_1$)
and AR(1; $\phi_2$), where one of the $\phi$'s is positive and the other is negative.

The mechanism generating the process changes from time to time, and
these changes manifest themselves at some unknown time points (epochs)
$\tau_1, \tau_2, \ldots, \tau_{m-1}$; that is, there are $m$ segments. The integer $m$ and the
epochs $\tau_g$, $g=1,2,\ldots,m-1$, are unknown. Generally there will be fewer than
$m$ generating mechanisms. The number of mechanisms (classes) will be denoted
by $k$; it will be assumed that $k$ is at most $m$. In some situations, $k$ is
specified; in others, it is not. With the $c$-th class is associated a
stochastic process, $P_c$, say. For example, above we spoke of a situation
with $k=2$ classes, where, for $c=1,2$, the process $P_c$ is AR(1; $\phi_c$).

Now with the $t$-th observation ($t=1,2,\ldots,n$) associate the label $\gamma_t$,
which is equal to $c$ if and only if $x_t$ arose from class $c$, $c=1,2,\ldots,k$. Each
time point $t$ gives rise to a pair $(x_t, \gamma_t)$, where $x_t$ is observable and $\gamma_t$ is
not. The process $\{x_t\}$ is the observed time series, and $\{\gamma_t\}$ will be called
the label process.

Define a segmentation, then, as a partition of the time index set
$\{t: t=1,2,\ldots,n\}$ into subsets $S_1 = \{1,2,\ldots,t_1\}$, $S_2 = \{t_1+1,\ldots,t_2\}$, $\ldots$,
$S_m = \{t_{m-1}+1,\ldots,n\}$, where $t_1 < t_2 < \cdots < t_{m-1} < t_m = n$. Each subset $S_g$ is a
segment. The integer $m$ is not specified. In the context of this model,
to segment the series is merely to estimate the $\gamma$'s.
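Because segmenting the series amounts to estimating the $\gamma$'s, the segments $S_g$ and the change points $t_1 < \cdots < t_{m-1}$ can be read off any label sequence mechanically. A minimal Python sketch, assuming the labels are held in a list of integers $1,\ldots,k$; the function names are illustrative, not from the paper:

```python
from itertools import groupby

def labels_to_segments(labels):
    """Turn a label sequence gamma_1..gamma_n (values 1..k) into segments
    S_1..S_m, each reported as (first t, last t, class) with 1-based t."""
    segments = []
    start = 1
    for cls, run in groupby(labels):      # each run of equal labels is one segment
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, cls))
        start += length
    return segments

def epochs(labels):
    """Change points t_1 < ... < t_{m-1}: the last time index of every
    segment except the final one (t_m = n is not a change point)."""
    return [end for (_, end, _) in labels_to_segments(labels)[:-1]]
```

For example, `labels_to_segments([2, 2, 3, 2, 2, 2, 1])` gives the four segments `[(1, 2, 2), (3, 3, 3), (4, 6, 2), (7, 7, 1)]` drawn from only three classes, so $m > k$ as allowed in the text.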
The idea underlying the development in the present paper is that of
transitions between classes. The labels will be treated as random variables
$\Gamma_t$ with transition probabilities $\Pr(\Gamma_t = d \mid \Gamma_{t-1} = c) = p_{cd}$, taken as stationary,
i.e., independent of $t$. The matrix of transition probabilities will be denoted
by $P$, that is, $P = [p_{cd}]$, $c=1,2,\ldots,k$, $d=1,2,\ldots,k$.

If a process is to be strictly cyclic, like intake, compression, combustion
for a combustion engine, or recession to recovery to expansion to recession, etc.,
in the business cycle, then this condition can be imposed by using a
transition probability matrix such as the following, with zeros in the
appropriate places.

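$$
P = \begin{pmatrix}
p_{11} & p_{12} & 0 \\
0 & p_{22} & p_{23} \\
p_{31} & 0 & p_{33}
\end{pmatrix},
$$

where rows are indexed by the label at time $t-1$ and columns by the label at time $t$, with classes 1: recession, 2: recovery, 3: expansion. Thus $p_{13} = p_{21} = p_{32} = 0$, and apart from remaining in the same class the only transitions permitted are recession to recovery, recovery to expansion, and expansion to recession.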

Later  we  shall  consider  a  matrix  like  this  but  with  different  restrictions; 
namely,  we  shall  allow  transitions  only  to  adjacent  states  (classes).  See 
Section  4.2. 

Segmentation will involve the simultaneous estimation of the parameters
of the stochastic processes $P_c$, $c=1,2,\ldots,k$, the transition probability matrix
$P$, and the labels $\{\gamma_t,\ t=1,2,\ldots,n\}$.

A joint probability density function (p.d.f.) for $\{(X_t, \Gamma_t),\ t=1,2,\ldots,n\}$
is, using $f$ as a generic symbol for any p.d.f., and successively conditioning
each of $\Gamma_1, X_1, \Gamma_2, X_2, \ldots, \Gamma_n, X_n$ on all preceding $X$'s and $\Gamma$'s,

$$
f(\gamma_1)\, f(x_1 \mid \gamma_1) \prod_{t=2}^{n} f(\gamma_t \mid x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1)\, f(x_t \mid \gamma_t, x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1). \qquad (2.1)
$$


The working assumptions of this paper are the following.

(A.1) The $\Gamma_t$ are a first-order stationary Markov chain, independent of the $x$'s:

$$
f(\gamma_t \mid x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1) = p_{\gamma_{t-1}\gamma_t}. \qquad (2.2)
$$


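(A.2) The random variable $X_t$ depends upon the past only through its own label
and through previous $X$'s, not through previous labels:

$$
f(x_t \mid \gamma_t, x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1) = f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1). \qquad (2.3)
$$

With these assumptions (2.1) becomes

$$
f(\gamma_1)\, f(x_1 \mid \gamma_1) \prod_{t=2}^{n} p_{\gamma_{t-1}\gamma_t}\, f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1). \qquad (2.4)
$$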


Note that this is

$$
\prod_{c=1}^{k} \prod_{d=1}^{k} p_{cd}^{\,n_{cd}}\; f(\gamma_1)\, f(x_1 \mid \gamma_1) \prod_{t=2}^{n} f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1), \qquad (2.5)
$$

where $n_{cd}$ = number of transitions from class $c$ to class $d$ (unobservable).


This  model,  with  transition  probabilities,  has  certain  advantages  over 
a model which uses only the epochs (change-points). The epochs are discrete
parameters,  and,  even  if  the  corresponding  generalized  likelihood  ratio  were 
asymptotically  chi-square,  the  number  of  degrees  of  freedom  would  not  be 
clear.  On  the  other  hand,  the  transition  probabilities  vary  in  an  interval 
and  it  is  clear  that  they  constitute  a  set  of  k(k-l)  free  parameters. 

Examples. (i) If each class-conditional process $P_c$ is a first-order
Markov process, then

$$
f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1) = f(x_t \mid \gamma_t, x_{t-1}). \qquad (2.6)
$$


(ii) If in addition the $c$-th class-conditional process is Gaussian first-order
autoregressive with autoregression coefficient $\phi_c$ and constant term $\delta_c$, with
common variance $\sigma^2$, then (2.6) holds with

$$
f(x_t \mid \gamma_t = c, x_{t-1}) = (2\pi\sigma^2)^{-1/2} \exp[-u_{tc}^2/(2\sigma^2)],
$$

where

$$
u_{tc} = x_t - (\phi_c x_{t-1} + \delta_c).
$$


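For example, the value of the likelihood for $\gamma_1 = \gamma_2 = \cdots = \gamma_m = 1$ and
$\gamma_{m+1} = \gamma_{m+2} = \cdots = \gamma_n = 2$ is, for given $x_1$,

$$
p_{11}^{\,m-1}\, p_{12}\, p_{22}^{\,n-m-1}\, (2\pi\sigma^2)^{-(n-1)/2} \exp[-q/(2\sigma^2)]
$$

(there are $m-1$ transitions from class 1 to class 1, one from class 1 to class 2, and
$n-m-1$ from class 2 to class 2, for $n-1$ in all), where

$$
q = \sum_{t=2}^{m} [x_t - (\phi_1 x_{t-1} + \delta_1)]^2 + \sum_{t=m+1}^{n} [x_t - (\phi_2 x_{t-1} + \delta_2)]^2.
$$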
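The two-segment likelihood above is easy to evaluate on the log scale for a candidate change point $m$. The sketch below is a hypothetical helper, not from the paper: it treats the AR(1) coefficients, constant terms, common variance, and transition probabilities as known, assumes all probabilities are strictly positive, and returns the logarithm of the displayed expression.

```python
import math

def two_segment_ar1_loglik(x, m, phi, delta, sigma2, p11, p12, p22):
    """Log of p11^(m-1) p12 p22^(n-m-1) (2 pi sigma2)^(-(n-1)/2) exp[-q/(2 sigma2)]:
    labels 1 for t = 1..m and 2 for t = m+1..n, conditional on x_1.
    x holds x_1..x_n in a 0-based list; phi and delta are (class 1, class 2) pairs."""
    n = len(x)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n so that both segments are non-empty")
    q = 0.0
    for t in range(2, n + 1):             # t is the paper's 1-based time index
        c = 0 if t <= m else 1            # class of x_t, 0-based
        residual = x[t - 1] - (phi[c] * x[t - 2] + delta[c])   # u_{tc}
        q += residual ** 2
    log_transitions = (m - 1) * math.log(p11) + math.log(p12) + (n - m - 1) * math.log(p22)
    return log_transitions - 0.5 * (n - 1) * math.log(2 * math.pi * sigma2) - q / (2 * sigma2)
```

With the other parameters held fixed, scanning $m = 1, \ldots, n-1$ and keeping the largest value picks the best single change point.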

In the simplest case the $X$'s are (conditionally) independent, given the $\gamma$'s.
Then $f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1) = f(x_t \mid \gamma_t)$. We shall pay special attention to
this case in the present paper. The p.d.f.'s $f(x \mid \Gamma_t = c)$, $c=1,2,\ldots,k$, are called
the class-conditional densities. In the parametric case the class-conditional
density takes the form

$$
f(x_t \mid \Gamma_t = c) = g(x_t; \beta_c), \qquad (2.7)
$$

where $\beta_c$ is a parameter indexing a family of p.d.f.'s of the form given by $g$.


3.  An  Algorithm 

3.1.  Development  of  the  algorithm 

The likelihood $L$ is (2.4) or (2.5), considered as a function of the
parameters, for fixed $\{x_t\}$. From (2.4), (2.5), and (2.7), the likelihood $L$
can be written in the form

$$
L = A(\{p_{cd}\})\, B(\{\gamma_t\}, \{\beta_c\}). \qquad (3.1)
$$

Hence, for fixed values of the $\gamma$'s and $\beta$'s, $L$ is maximized with respect to the
$p$'s by maximizing the factor $A$. But $A = \prod_{c=1}^{k} \prod_{d=1}^{k} p_{cd}^{\,n_{cd}}$. The $n_{cd}$ are specified
by the $\gamma$'s. So from the usual multinomial model, it follows that the maximum
likelihood estimates of the $p$'s, for fixed values of the other parameters,
are given by

$$
\hat p_{cd} = n_{cd}/n_c, \qquad (3.2)
$$

where $n_c = n_{c1} + n_{c2} + \cdots + n_{ck}$.
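Equation (3.2) is a row-normalized table of transition counts. A minimal sketch, assuming labels coded $1,\ldots,k$; since (3.2) is undefined when $n_c = 0$, this hypothetical helper raises an error in that case rather than guess a value:

```python
from collections import Counter

def estimate_transition_matrix(labels, k):
    """Maximum likelihood estimates p_cd = n_cd / n_c (eq. 3.2) from labels coded 1..k."""
    n = Counter(zip(labels[:-1], labels[1:]))     # n[(c, d)] = number of c-to-d transitions
    P = []
    for c in range(1, k + 1):
        n_c = sum(n[(c, d)] for d in range(1, k + 1))
        if n_c == 0:
            raise ValueError(f"class {c} is never left, so row {c} of (3.2) is undefined")
        P.append([n[(c, d)] / n_c for d in range(1, k + 1)])
    return P
```

For the labels `[1, 1, 2, 2, 2, 1]` the counts are $n_{11} = n_{12} = 1$ and $n_{21} = 1$, $n_{22} = 2$, giving rows $(1/2, 1/2)$ and $(1/3, 2/3)$.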

Further, given the $p$'s and $\gamma$'s, the estimates of the distributional parameters
(the $\beta$'s) are easy to obtain. This suggests the following algorithm.

Step 0. Set the $\beta$'s at initial trial values. Set the $p$'s at initial trial
values. Set $f(\gamma_1)$ at initial trial values, e.g., $f(\gamma_1) = 1/k$, for $\gamma_1 = 1,2,\ldots,k$.

Step 1. Estimate $\gamma_1$ by maximizing $f(\gamma_1)\, f(x_1 \mid \gamma_1)$.

Step 2. For $t=2,3,\ldots,n$, estimate $\gamma_t$ by maximizing
$p_{\hat\gamma_{t-1}\gamma_t}\, f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1)$.

Step 3. Now, having labeled the observations, estimate the distributional
parameters, and estimate the transition probabilities by (3.2).

Step 4. If no observation has changed labels from the previous iteration,
stop. Otherwise, repeat the procedure from Step 1.

Step 2 is Bayesian classification of $x_t$, with prior probabilities
$p_{\hat\gamma_{t-1}\gamma_t}$. Hence all the techniques for classification in particular models
are available (e.g., use of linear discriminant functions when the observations
are multivariate normal with common covariance matrix).
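For the conditionally independent Gaussian case emphasized in this paper (class means $\mu_c$, common variance $\sigma^2$), Steps 0 to 4 can be sketched as follows. This is an illustrative implementation under those assumptions, not the author's program. $f(\gamma_1)$ is held at $1/k$ throughout, the variance is the pooled within-class maximum likelihood estimate, and, as an assumption of the sketch, a row of $P$ with no observed transitions keeps its previous values. It stops with an error if a class receives no observations, the situation marked "stopped" in Table 2.

```python
import math

def segment_gaussian(x, init_means, init_P, init_sigma2, max_iter=100):
    """Iterated ML segmentation (Steps 0-4) for conditionally independent Gaussian
    classes with a common variance. Returns (labels 1..k, means, sigma2, P)."""
    k = len(init_means)
    means, sigma2 = list(init_means), init_sigma2      # Step 0: trial beta's
    P = [list(row) for row in init_P]                  # Step 0: trial p's
    log_f_gamma1 = math.log(1.0 / k)                   # Step 0: f(gamma_1) = 1/k, kept fixed
    labels = None

    def log_density(v, c):                             # log f(x_t | gamma_t = c)
        return -0.5 * math.log(2 * math.pi * sigma2) - (v - means[c]) ** 2 / (2 * sigma2)

    def log_p(c, d):                                   # log p_cd, with log 0 = -inf
        return math.log(P[c][d]) if P[c][d] > 0 else -math.inf

    for _ in range(max_iter):
        # Step 1: label x_1 by maximizing f(gamma_1) f(x_1 | gamma_1)
        new = [max(range(k), key=lambda c: log_f_gamma1 + log_density(x[0], c))]
        # Step 2: Bayes classification of x_t with prior p_{gamma_{t-1}, c}
        for t in range(1, len(x)):
            prev = new[-1]
            new.append(max(range(k), key=lambda c: log_p(prev, c) + log_density(x[t], c)))
        # Step 4: stop when no observation changed its label
        if new == labels:
            break
        labels = new
        # Step 3: re-estimate the means, the pooled variance, and P by (3.2)
        counts = [labels.count(c) for c in range(k)]
        if 0 in counts:
            raise RuntimeError("a class received no observations ('stopped' in Table 2)")
        means = [sum(v for v, g in zip(x, labels) if g == c) / counts[c] for c in range(k)]
        sigma2 = sum((v - means[g]) ** 2 for v, g in zip(x, labels)) / len(x)
        n_cd = [[0] * k for _ in range(k)]
        for a, b in zip(labels[:-1], labels[1:]):
            n_cd[a][b] += 1
        P = [[n_cd[c][d] / sum(n_cd[c]) if sum(n_cd[c]) else P[c][d] for d in range(k)]
             for c in range(k)]
    return [g + 1 for g in labels], means, sigma2, P
```

On a short series such as `[0.1, -0.2, 0.0, 10.1, 9.8, 10.3, 0.2, -0.1]` with starting means 0 and 10, the labels settle after one re-estimation at `[1, 1, 1, 2, 2, 2, 1, 1]`.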

3.2. The first iteration

When the $k$ class-conditional processes consist of independent, identically
distributed normal random variables with a variance common to all classes, one
can start by choosing initial means and labeling the observations by a minimum-distance
clustering procedure. [This is one iteration of ISODATA (Ball and Hall,
1967). One could iterate further at this stage.] From this clustering,
initial estimates of the transition probabilities and the variance are obtained.
This starting procedure could also be used for fitting AR models by taking the
initial trial values of the autoregression coefficients as zero.
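The starting procedure can be sketched as a single minimum-distance pass. This illustration makes two assumptions the paper does not spell out: the variance is computed about the resulting cluster means, and a row of $P$ with no observed transitions is set to $1/k$.

```python
def min_distance_start(x, init_means):
    """One minimum-distance (ISODATA-style) pass: label each x_t by the nearest
    initial mean, then derive starting values (labels 1..k, means, variance, P)."""
    k = len(init_means)
    labels = [min(range(k), key=lambda c: abs(v - init_means[c])) for v in x]
    groups = [[v for v, g in zip(x, labels) if g == c] for c in range(k)]
    if any(len(grp) == 0 for grp in groups):
        raise ValueError("an initial mean attracted no observations; choose other starting means")
    means = [sum(grp) / len(grp) for grp in groups]
    # Assumption: pooled variance about the cluster means.
    sigma2 = sum((v - means[g]) ** 2 for v, g in zip(x, labels)) / len(x)
    n_cd = [[0] * k for _ in range(k)]
    for a, b in zip(labels[:-1], labels[1:]):
        n_cd[a][b] += 1
    # Transition probabilities by (3.2); assumption: 1/k for a row with no transitions.
    P = [[n_cd[c][d] / sum(n_cd[c]) if sum(n_cd[c]) else 1.0 / k for d in range(k)]
         for c in range(k)]
    return [g + 1 for g in labels], means, sigma2, P
```

The returned values can serve as the Step 0 inputs of the algorithm in Section 3.1.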
3.3. Restrictions on the transitions

As mentioned in Section 2, one might wish to place restrictions on the transitions,
e.g., to allow transitions only to adjacent states. The model permits such
restrictions: the maximization is conducted subject to the condition that the
corresponding transition probabilities are zero. This is easily implemented in the
algorithm. Once a given transition probability is set at zero, the algorithm will fit
no such transitions (the factor $p_{\hat\gamma_{t-1}\gamma_t}$ in Step 2 is then zero), and, by (3.2), the
corresponding transition probability will remain zero at every iteration.
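The reason a zero stays zero is visible in a single Step 2 decision. The log of a zero transition probability is minus infinity, so that class can never win the maximization, no such transition is ever counted, and (3.2) returns zero again. In the sketch below the adjacent-states matrix `P_ADJ` is hypothetical; the means and variance are the Section 4.1 estimates. Because the variance is common to all classes, the Gaussian normalizing constant is dropped.

```python
import math

def classify_step(prev, v, P, means, sigma2):
    """One Step 2 decision (0-based classes): argmax_c log p_{prev,c} - (v - mean_c)^2 / (2 sigma2).
    A zero transition probability scores -inf, so that transition is never chosen."""
    def score(c):
        if P[prev][c] == 0:
            return -math.inf
        return math.log(P[prev][c]) - (v - means[c]) ** 2 / (2 * sigma2)
    return max(range(len(means)), key=score)

# Hypothetical adjacent-states-only matrix (p_13 = p_31 = 0).
P_ADJ = [[0.7, 0.3, 0.0],
         [0.2, 0.6, 0.2],
         [0.0, 0.3, 0.7]]
MEANS = [-1.3, 6.2, 12.3]    # Section 4.1 class means
SIGMA2 = 5.194               # Section 4.1 common variance
```

An observation of 12.0 following class 1 (index 0) is assigned to class 2 (index 1), because the move from recession straight to expansion is forbidden. The same observation following class 2 goes to class 3.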

4.  An  Example 

Here,  for  a  specific  numerical  example,  the  problems  of  fitting  the  model 
for  a  fixed  k,  choice  of  k,  and  prediction  will  be  discussed. 

Quarterly  gross  national  product  (GNP)  in  current  (non-constant)  dollars 
for  the  twenty  years  1947  to  1966  was  considered.  (This  makes  a  good  size 
dataset  for  the  current  exposition.)  Parameters  were  estimated  from  the  first 
19  years,  the  last  four  observations  (1966)  being  saved  to  test  the  accuracy 
of  predictions.  (See  Section  4.3.)  The  data  and  first  differences  are  given 
in  Table  1.  The  raw  series  is  nonstationary,  so  the  first  differences 
(increases  in  quarterly  GNP)  were  analyzed.  The  notation  is 

$x_t = \mathrm{GNP}_{t+1} - \mathrm{GNP}_t$, $t = 1,2,\ldots,79$;

e.g., $\mathrm{GNP}_1$ is the GNP at the end of the quarter 1947-1, $\mathrm{GNP}_2$ is that at the
end of 1947-2, and $x_1 = \mathrm{GNP}_2 - \mathrm{GNP}_1$ is the increase in GNP during the second
quarter of 1947. (A negative value of an $x$ indicates a decrease in GNP
for the corresponding quarter.) A Gaussian model was used.
4.1. Fitting the model

In this section we discuss the fitting of a model with $k=3$ classes,
discussion of the choice between alternative models being deferred to the
next section. The three classes may be taken as corresponding to recession,
recovery, and expansion, although some may prefer to think of the segments
labeled as recovery as level periods corresponding to peaks and troughs.

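The approximate maximum likelihood solution found by the iterative procedure
was $\hat\mu_1 = -1.3$, $\hat\mu_2 = 6.2$, $\hat\mu_3 = 12.3$, $\hat\sigma^2 = 5.194$, $\hat\sigma = (5.194)^{1/2} = 2.28$ (the units are
billions of current (non-constant) dollars) and

$$
\hat P = \begin{pmatrix}
.625 & .250 & .125 \\
.156 & .625 & .219 \\
.039 & .269 & .692
\end{pmatrix}.
$$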

The  estimated  labels  are  given  below;  labels  (r=recession,  e=expansion) 

resulting  from  fitting  k=2  classes  (see  below)  are  also  given. 

t = 1 to 23
  label, k=3: 22322211111333332222123
  label, k=2: rreeeerrrrreeeeeeeerree

t = 24 to 50
  label, k=3: 221111223222222222211233331
  label, k=2: errrrrreeeeerrreererrreeeer

t = 51 to 75
  label, k=3: 2321113333322223333323333
  label, k=2: eerrrreeeeeeeeeeeeeeeeeee

The process was in state 1 for 21% of the time, in state 2 for 44% of the time,
and in state 3 for 35% of the time.

The conventional wisdom regarding recessions during the period of time
covered by these data includes the following. [See, e.g., Mansfield (1974),
pp. 209-211.] In 1948-1949 ($t$ = 4 to 11) there was a reduction of inventory
investment. In 1953-1954 ($t$ = 24 to 31) there was a reduction in government
expenditures when the Korean conflict came to a close. In mid-1957 to late
1958 ($t$ = 42 to 45) an ongoing recession was aggravated by a drop in defense
expenditures in late 1957. In 1960 ($t$ = 52 to 55) monetary and fiscal authorities
had put on the brakes; interest rates had risen substantially during
1958 and 1959.

An interesting feature of the model and the algorithm is that, as the
iterations proceed, some isolated labels change to conform to their neighbors.
This should be the case when $p_{cc}$ is large relative to $p_{cd}$, $d=1,2,\ldots,k$, $d \ne c$.

It is customary to fit an ARI(1,1) model to such data. [See, e.g.,
Nelson (1973), pp. 64-65.] Hence AR(1)'s were fit within segments in a
preliminary analysis of the data. One might expect that segmentation would
absorb the autocorrelation. This was in fact found to be the case: the values
of the estimated first-order autocorrelation coefficients were not significantly
different from zero. Thus the model with conditional independence, given the
labels, was used.

4.2.  Choice  of  number  of  classes 

Various values of $k$ were tried, the results being compared by means of
Akaike's automatic (model) identification criterion (AIC). [See, e.g.,
Akaike (1981).] The AIC for a given model is

$$
\mathrm{AIC} = -2 \log_e L + 2p,
$$

where $L$ is the maximized value of the likelihood and $p$ is the number
of parameters in the model. According to AIC, inclusion of an additional
parameter in a model is appropriate if $\log_e L$ increases by one unit or more,
i.e., if $L$ increases by a factor of $e$ or more.
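The comparison amounts to computing $-2 \log_e L + 2p$ for each fitted model and taking the smallest value. A minimal sketch: the helper name is illustrative, and the dictionary only transcribes the AIC values reported in Table 2. The two "stopped" fits are omitted, and 506.5+ is entered as 506.5.

```python
def aic(log_likelihood, n_params):
    """Akaike's criterion, AIC = -2 log_e L + 2p; smaller values are preferred."""
    return -2.0 * log_likelihood + 2.0 * n_params

# AIC values as reported in Table 2.
TABLE_2_AIC = {
    "Segmentation, 2 classes": 481.4,
    "Segmentation, 3 classes, full": 483.6,
    "Segmentation, 3 classes, sparse": 488.5,
    "Segmentation, 4 classes, full": 507.1,
    "Segmentation, 4 classes, sparse": 486.8,
    "Segmentation, 5 classes, full": 506.5,
    "ARI(1,1)": 453.2,
    "IID": 1721.4,
}

best_overall = min(TABLE_2_AIC, key=TABLE_2_AIC.get)
best_segmentation = min((m for m in TABLE_2_AIC if m.startswith("Segmentation")),
                        key=TABLE_2_AIC.get)
```

Going from $p$ to $p+1$ parameters while $\log_e L$ rises by exactly one unit leaves the AIC unchanged (for example, `aic(-99.0, 4) == aic(-100.0, 3)`), which is the break-even point described above.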
The model was fit with several values of $k$ and unrestricted transition
probabilities. Also, since it seems reasonable to restrict the transitions
to those between adjacent states, these models were evaluated as well. In the
case of $k=3$, where the states might be considered as recession, recovery,
and expansion, this means setting equal to zero the transition probabilities
corresponding to the transitions recession-to-expansion and expansion-to-recession.
Also, by way of comparison, the ARI(1,1) model

$$
x_t = \phi x_{t-1} + \delta + u_t, \qquad x_t = \mathrm{GNP}_{t+1} - \mathrm{GNP}_t,
$$

was fit. The IID model of independent and identically distributed Gaussian
observations was fit also, just for comparison. The results are given in
Table 2. The best segmentation model, as indicated by minimum AIC, is that
with only two classes. [The AIC for ARI(1,1) was even lower.] The AIC for
the IID model was quite large, indicating a very poor fit, as would be
expected.

The results for $k=2$ classes (which might be labeled recession, expansion)
were $\hat\mu_1 = 0.43$, $\hat\mu_2 = 10.09$, $\hat\sigma = 3.306$, and

$$
\hat P = \begin{pmatrix}
.667 & .333 \\
.170 & .830
\end{pmatrix}.
$$

The process was in state 1 for 37% of the time and in state 2 the other 63% of
the time. The labels were given above.

A model with only two classes enjoys advantages relating to its relative simplicity.
4.3. Prediction

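If there is feedback, in the sense that $\gamma_t$ becomes known before $x_{t+1}$
is to be predicted, then, given $\gamma_t = c$, one can give the prediction

$$
\hat x_{t+1} \mid \gamma_t = c \;=\;
\begin{cases}
\hat\mu_1 & \text{with probability } \hat p_{c1}, \\
\hat\mu_2 & \text{with probability } \hat p_{c2}, \\
\;\vdots & \\
\hat\mu_k & \text{with probability } \hat p_{ck}.
\end{cases}
$$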

In this example this gives rise to a "recession probability," $\hat p_{c1}$, reminiscent
of the "precipitation probability" of meteorology.

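Similarly, one has

$$
\hat x_{t+h} \mid \gamma_t = c \;=\;
\begin{cases}
\hat\mu_1 & \text{with probability } \hat p_{c1}^{(h)}, \\
\hat\mu_2 & \text{with probability } \hat p_{c2}^{(h)}, \\
\;\vdots & \\
\hat\mu_k & \text{with probability } \hat p_{ck}^{(h)},
\end{cases}
$$

where $\hat p_{cd}^{(h)}$ is the natural estimate of the $h$-step $c$-to-$d$ transition probability,
the $(c,d)$-th element of the $h$-th power of $\hat P$.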

These are vector estimates, with probabilities attached to the elements
of the vector. A scalar estimate is given by $\sum_{d=1}^{k} \hat p_{cd}^{(h)} \hat\mu_d$ for any $h = 1,2,\ldots$.
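The vector and scalar $h$-step predictions only need the $h$-th power of $\hat P$. A minimal pure-Python sketch, with illustrative helper names, using the $k=3$ estimates of Section 4.1. States are indexed from 0 in the code, so state 3 (expansion) is index 2.

```python
def mat_mult(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_power(P, h):
    """P^h by repeated multiplication, starting from the identity (h >= 0)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(h):
        result = mat_mult(result, P)
    return result

def predict_h_steps(P, means, c, h):
    """Given gamma_t = c (0-based): vector estimate [(mean_d, p_cd^(h)), ...]
    and scalar estimate sum_d p_cd^(h) * mean_d."""
    row = mat_power(P, h)[c]
    vector = list(zip(means, row))
    scalar = sum(p * mu for mu, p in vector)
    return vector, scalar

# Estimates from Section 4.1 (k = 3).
P_HAT = [[0.625, 0.250, 0.125],
         [0.156, 0.625, 0.219],
         [0.039, 0.269, 0.692]]
MU_HAT = [-1.3, 6.2, 12.3]
```

From the expansion state, `predict_h_steps(P_HAT, MU_HAT, 2, 1)` gives a scalar one-step estimate of about 10.13. The rows for $h = 2, 3, 4$ agree, to three decimals, with the probabilities quoted later in this section for $x_{77}$, $x_{78}$, and $x_{79}$.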


Now let us consider prediction based on the model with $k=3$ classes, fit in
Section 4.1. We predict $x_{76}$, $x_{77}$, $x_{78}$, and $x_{79}$. Consider first the prediction
of $x_{76}$. If, before it had to be predicted, one had been sure, due to the
accumulation of information on various economic indicators, that the process
had then been in an expansion (state 3), then the relevant estimated transition
probabilities would be .039, .269, and .692, for transitions from state 3
to states 1, 2, and 3, respectively. One would make the prediction

$$
\hat x_{76} \mid \gamma_{75} = 3 \;=\;
\begin{cases}
-1.3 \pm 2.338 & \text{with probability } .039, \\
6.2 \pm 2.318 & \text{with probability } .269, \\
12.3 \pm 2.321 & \text{with probability } .692,
\end{cases}
$$

where the numbers after ± are approximate standard errors of prediction, namely
$[\hat\sigma^2(1 + 1/n_c)]^{1/2}$, $c = 1,2,3$, or, since the numbers of observations assigned
to the three groups were $n_1 = 19$, $n_2 = 29$, and $n_3 = 27$, $[5.194(1 + 1/19)]^{1/2} = 2.338$,
$[5.194(1 + 1/29)]^{1/2} = 2.318$, and $[5.194(1 + 1/27)]^{1/2} = 2.321$.
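These standard errors are $[\hat\sigma^2(1 + 1/n_c)]^{1/2}$, where the $1/n_c$ term accounts for estimating the class mean. A quick check of the arithmetic, assuming $\hat\sigma^2 = 5.194$ and the class sizes given above:

```python
import math

def prediction_se(sigma2, n_c):
    """Approximate standard error of prediction, [sigma^2 (1 + 1/n_c)]^(1/2)."""
    return math.sqrt(sigma2 * (1.0 + 1.0 / n_c))

SIGMA2_HAT = 5.194              # pooled variance estimate, k = 3 (Section 4.1)
N_C = {1: 19, 2: 29, 3: 27}     # observations assigned to each class
SE = {c: prediction_se(SIGMA2_HAT, n) for c, n in N_C.items()}
```

Rounded to three decimals, `SE` is {1: 2.338, 2: 2.318, 3: 2.321}, matching the values above.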

The actual value of $x_{76}$ was 19.5, a very strong gain in GNP for that period,
certainly consistent with a prediction of "expansion." The values of $\hat x_t \mid \gamma_{t-1} = 3$,
$t = 77, 78, 79$, are the same as those for $\hat x_{76} \mid \gamma_{75} = 3$.

The difference 19.5 - 12.3 = 7.2 is rather large. However, the fitted
ARI(1,1) model $\hat x_{t+1} = 0.597 x_t + 2.64$, with $\hat\sigma = 4.95$, also made a large error
for this quarter. It gave a prediction of 13.9, with an approximate standard
error of prediction of 4.95. Its successive forecasts for the last three
quarters represented in the data set, in each case using the observation from
the previous quarter, were 14.3, 10.9, and 10.2, each with standard error of
prediction equal to 4.95, compared with actual results of 13.8, 12.6, and 14.8,
respectively.
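The ARI(1,1) one-step forecasts quoted above follow from $\hat x_{t+1} = 0.597 x_t + 2.64$ applied to the observed differences in Table 1 ($x_{75} = 18.9$, $x_{76} = 19.5$, $x_{77} = 13.8$, $x_{78} = 12.6$). A small sketch reproducing them to one decimal; the helper name is illustrative:

```python
PHI, DELTA = 0.597, 2.64      # fitted ARI(1,1) coefficients for the differences x_t

def ari_one_step(x_t):
    """One-step forecast of x_{t+1} given x_t: phi * x_t + delta."""
    return PHI * x_t + DELTA

observed = {75: 18.9, 76: 19.5, 77: 13.8, 78: 12.6}     # x_t from Table 1
forecasts = {t + 1: round(ari_one_step(v), 1) for t, v in observed.items()}
```

The result is `{76: 13.9, 77: 14.3, 78: 10.9, 79: 10.2}`, the four forecasts reported in the text.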

Now let us consider prediction more than one period ahead. Given
information for $t=75$, we predict $x_{77}$, $x_{78}$, and $x_{79}$. Using the third row
of the second, third, and fourth powers of $\hat P$, one finds


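$$
\hat x_{77} \mid \gamma_{75} = 3 \;=\;
\begin{cases}
-1.3 & \text{with prob. } .093, \\
6.2 & \text{with prob. } .364, \\
12.3 & \text{with prob. } .543,
\end{cases}
$$

$$
\hat x_{78} \mid \gamma_{75} = 3 \;=\;
\begin{cases}
-1.3 & \text{with prob. } .136, \\
6.2 & \text{with prob. } .397, \\
12.3 & \text{with prob. } .467,
\end{cases}
$$

$$
\hat x_{79} \mid \gamma_{75} = 3 \;=\;
\begin{cases}
-1.3 & \text{with prob. } .165, \\
6.2 & \text{with prob. } .408, \\
12.3 & \text{with prob. } .427.
\end{cases}
$$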

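We have

$$
\lim_{h \to \infty} \hat x_{t+h} \mid \gamma_t = c \;=\;
\begin{cases}
-1.3 & \text{with prob. } .211, \\
6.2 & \text{with prob. } .411, \\
12.3 & \text{with prob. } .378,
\end{cases}
\qquad (4.1)
$$

independent of $c$ and $t$, because $(.211, .411, .378)$ is the estimated long-run
distribution across the states. The predictions of ARI(1,1) are

$$
\begin{aligned}
\hat x_{77} \mid x_{75} &= 10.9, & \text{std. err.} &= 5.76, \\
\hat x_{78} \mid x_{75} &= 9.2, & \text{std. err.} &= 6.03, \\
\hat x_{79} \mid x_{75} &= 8.1, & \text{std. err.} &= 6.12.
\end{aligned}
$$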

By  way  of  comparison  with  (4.1),  in  the  long  run,  these  forecasts  from  ARI(1,1) 
tend  to  6.55,  the  estimated  mean  of  the  process,  with  an  estimated  standard 
error  of  6.167,  the  estimated  standard  deviation  of  the  x's. 
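Both long-run statements can be checked numerically. The limiting distribution in (4.1) is the stationary distribution of $\hat P$, and the ARI(1,1) forecasts converge to $\delta/(1-\phi)$ with variance $\sigma^2/(1-\phi^2)$. The sketch below, with illustrative helper names, uses power iteration for the stationary distribution and the standard AR(1) forecast recursion, whose $h$-step standard error is $\sigma(1 + \phi^2 + \cdots + \phi^{2(h-1)})^{1/2}$. Because the coefficients are rounded, it matches the figures in the text only to about the reported precision; for example, it gives 10.95 rather than 10.9 for $\hat x_{77} \mid x_{75}$.

```python
import math

P_HAT = [[0.625, 0.250, 0.125],
         [0.156, 0.625, 0.219],
         [0.039, 0.269, 0.692]]

def stationary_distribution(P, iters=500):
    """Long-run distribution pi = pi P by power iteration from the uniform vector
    (assumes an irreducible, aperiodic chain, which P_HAT is)."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[c] * P[c][d] for c in range(k)) for d in range(k)]
    return pi

PHI, DELTA, SIGMA = 0.597, 2.64, 4.95    # fitted ARI(1,1) for the differences

def ari_forecast(x_last, h):
    """h-step forecast of x_{t+h} from x_t, with standard error
    sigma * sqrt(1 + phi^2 + ... + phi^(2(h-1)))."""
    mean = x_last
    for _ in range(h):
        mean = PHI * mean + DELTA
    se = SIGMA * math.sqrt(sum(PHI ** (2 * j) for j in range(h)))
    return mean, se

long_run_mean = DELTA / (1 - PHI)                # about 6.55
long_run_sd = SIGMA / math.sqrt(1 - PHI ** 2)    # about 6.17
```

Starting from $x_{75} = 18.9$, `ari_forecast(18.9, h)` for $h = 2, 3, 4$ gives about (10.95, 5.77), (9.18, 6.03), and (8.12, 6.12).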
Table 1. GNP. Units: billions of current (non-constant) dollars
(from Nelson (1973), pp. 100-101)

Quarter               1      2      3      4      1      2      3      4
1947-48  GNP        224    228    232    242    248    256    263    264
         ΔGNP       4.0    4.2   10.3    5.9    7.6    6.9    1.4   -5.4
1949-50  GNP        259    255    257    255    266    275    293    305
         ΔGNP      -3.3    1.9   -2.1   11.0    9.4   17.7   11.4   13.5
1951-52  GNP        318    326    333    337    340    339    346    358
         ΔGNP       7.8    7.0    4.1    2.6   -0.4    6.5   12.1    6.5
1953-54  GNP        364    368    366    361    361    360    365    373
         ΔGNP       3.3   -1.7   -5.0   -0.1   -0.3    4.3    8.7   12.8
1955-56  GNP        386    394    403    409    411    416    421    430
         ΔGNP       8.2    8.1    6.3    1.8    5.6    4.4    8.9    7.4
1957-58  GNP        437    440    446    442    435    438    451    464
         ΔGNP       3.0    6.4   -4.8   -6.8    3.6   13.1   13.0    9.6
1959-60  GNP        474    487    484    491    503    505    504    503
         ΔGNP      12.9   -2.9    6.5   12.5    1.7   -0.5   -0.9    0.3
1961-62  GNP        504    515    524    538    548    557    564    572
         ΔGNP      11.3    9.3   13.5   10.1    9.4    7.2    7.6    5.4
1963-64  GNP        577    584    595    606    618    628    639    645
         ΔGNP       6.8   10.5   11.1   11.9   10.3   10.9    6.2   17.7
1965-66  GNP        663    676    691    710    730    743    756    771
         ΔGNP      12.9   15.4   18.9   19.5   13.8   12.6   14.8
Table 2. Fitting models. (See Section 4.2.)

Model                                                      AIC
Segmentation, 2 classes                                    481.4 (a)
Segmentation, 3 classes, full trans. prob. matrix          483.6
Segmentation, 3 classes, sparse trans. prob. matrix (b)    488.5
Segmentation, 4 classes, full trans. prob. matrix          507.1
Segmentation, 4 classes, sparse trans. prob. matrix (b)    486.8
Segmentation, 5 classes, full trans. prob. matrix          506.5+
Segmentation, 5 classes, sparse trans. prob. matrix (b)    stopped (c)
Segmentation, 6 classes, full trans. prob. matrix          stopped (c)
AR(1) (d)                                                  453.2 (e)
IID (f)                                                    1721.4

a. Optimum, among segmentation models considered.
b. Allows transitions only to adjacent states.
c. Stopped, i.e., the algorithm reached an iteration where it allocated
   no observations to one of the classes.
d. AR(1) for the differences, i.e., ARI(1,1) for the original series.
e. Optimum, among all models considered.
f. Observations treated as a random sample from a normal distribution.